Barrier Synchronization Pattern

Authors

  • Rajesh K. Karmani
  • Nicholas Chen
Abstract

Parallel algorithms divide work into multiple concurrent tasks. These tasks, or units of execution (UEs), may run in parallel depending on the physical resources available. UEs commonly proceed in phases, where the next phase cannot start until all UEs have completed the previous one, typically because each UE depends on data written by other UEs during the previous phase. Since UEs may execute at different speeds, they need to wait for one another before proceeding to the next phase. Barriers are commonly used to enforce such waiting. Figure 1 illustrates how a barrier works: a UE executes its code until it reaches a barrier, then waits until all other UEs have reached that barrier before proceeding. Consider the Barnes-Hut [BH86] N-body simulation algorithm. This is an iterative algorithm with well-defined phases: building the octree, calculating the forces between bodies, and updating the position and velocity of each body. One way to parallelize the algorithm is to have multiple UEs execute each of the three phases. However, no UE can proceed to the next phase until all UEs have finished the previous phase; it makes no sense to update positions while some UEs are still calculating the forces between bodies. A barrier at which all UEs wait for each other before continuing with their respective computations is called a global barrier. We distinguish it from another kind of barrier, the local barrier, at which a parent task waits for all of its child tasks to finish before it continues.
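The phased execution described in the abstract can be sketched with Python's standard `threading.Barrier`. This is only an illustration under stated assumptions, not the authors' implementation: the function name `simulate`, the one-dimensional "bodies", and the toy force rule are all hypothetical stand-ins for the Barnes-Hut phases. Each thread plays the role of one UE; `barrier.wait()` acts as the global barrier between phases, and the final `join()` calls act as a local barrier where the parent waits for its children.

```python
import threading


def simulate(num_ues=4, num_steps=3):
    """Toy two-phase simulation; one thread (UE) per body.

    Hypothetical example: the "force" on body i is simply the sum of
    displacements to every other body, standing in for a real N-body kernel.
    """
    positions = [float(i) for i in range(num_ues)]
    forces = [0.0] * num_ues
    # Global barrier: releases only when all num_ues threads have arrived.
    barrier = threading.Barrier(num_ues)

    def worker(i):
        for _ in range(num_steps):
            # Phase 1: read every position, write only forces[i].
            forces[i] = sum(positions[j] - positions[i] for j in range(num_ues))
            # No UE may update a position until all forces are computed.
            barrier.wait()
            # Phase 2: write only positions[i].
            positions[i] += 0.1 * forces[i]
            # No UE may start the next force phase until all positions are updated.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_ues)]
    for t in threads:
        t.start()
    # Local barrier: the parent waits for all child tasks to finish.
    for t in threads:
        t.join()
    return positions


if __name__ == "__main__":
    print(simulate())
```

Without the first `barrier.wait()`, a fast UE could move its body while a slower UE is still reading positions to compute forces, which is exactly the hazard the abstract describes.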

Similar resources

Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels

We describe the design and implementation of methods to support reasoning about data races in GPU kernels where constructs other than the standard barrier primitive are used for synchronization. At one extreme we consider kernels that exploit implicit, coarse-grained synchronization between threads in the same warp, a feature provided by many architectures. At the other extreme we consider kern...

Barrier Synchronization on a Loaded SMP Using Two-Phase Waiting Algorithms

Little work has been done on the performance of barrier synchronization using two-phase blocking, as the common wisdom is that it is useless to spin if the total number of threads in the system exceeds the number of processors. We challenge this view and show that it may be beneficial to spin-wait if the spinning period is set to be a bit more than twice the context switch overhead (rather than...

Area and Performance Optimization of Barrier Synchronization on Multi-core Network-on-Chips

Barrier synchronization is commonly and widely used to synchronize the execution of parallel processor cores on multi-core Network-on-Chips (NoCs). Since its global nature may cause heavy serialization resulting in large performance penalty, barrier synchronization should be carefully designed to have low latency communication and to minimize overall completion time. Therefore, in the paper, we...

ASYNC Loop Constructs for Relaxed Synchronization

Conventional iterative solvers for partial differential equations impose strict data dependencies between each solution point and its neighbors. When implemented in OpenMP, they repeatedly execute barrier synchronization in each iterative step to ensure that data dependencies are strictly satisfied. We propose new parallel annotations to support an asynchronous computation model for iterative s...

Fast Barrier Synchronization in Wormhole k-ary n-cube Networks with Multidestination Worms

This paper presents a new approach to implement fast barrier synchronization in wormhole k-ary n-cubes. The novelty lies in using multidestination messages instead of the traditional single destination messages. Two different multidestination worm types, gather and broadcasting, are introduced to implement the report and wake-up phases of barrier synchronization, respectively. Algorithms for co...

Fast Barrier Synchronization on Shared Fast Ethernet

Shared LAN is presently the most widespread networking technology, due to its extremely low cost and favourable cost/performance ratio. Clusters of Personal Computers (PCs) leveraging shared 100base-T Ethernet may currently offer the best price/performance in parallel processing. Most numerical parallel algorithms make heavy use of collective communications and especially barrier synchronization...

Publication date: 2009